 significant change


Multi-Dimensional Summarization Agents with Context-Aware Reasoning over Enterprise Tables

arXiv.org Artificial Intelligence

We propose a novel framework for summarizing structured enterprise data across multiple dimensions using large language model (LLM)-based agents. Traditional table-to-text models often lack the capacity to reason across hierarchical structures and context-aware deltas, which are essential in business reporting tasks. Our method introduces a multi-agent pipeline that extracts, analyzes, and summarizes multi-dimensional data using agents for slicing, variance detection, context construction, and LLM-based generation. Our results show that the proposed framework outperforms traditional approaches, achieving 83% faithfulness to underlying data, superior coverage of significant changes, and high relevance scores (4.4/5) for decision-critical insights. The improvements are especially pronounced in categories involving subtle trade-offs, such as increased revenue due to price changes amid declining unit volumes, which competing methods either overlook or address with limited specificity. We evaluate the framework on Kaggle datasets and demonstrate significant improvements in faithfulness, relevance, and insight quality over baseline table summarization approaches.
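The slicing and variance-detection stages named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the row schema (`period`, `revenue`), the function name `detect_variances`, and the 10% threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def detect_variances(rows, dimension, threshold=0.10):
    """Slice rows by `dimension`, total revenue per period, and flag
    slices whose relative change meets or exceeds `threshold`.

    Hypothetical schema: each row is a dict with the slicing dimension,
    a "period" of either "prior" or "current", and a numeric "revenue".
    """
    totals = defaultdict(lambda: {"prior": 0.0, "current": 0.0})
    for row in rows:
        totals[row[dimension]][row["period"]] += row["revenue"]

    findings = []
    for key, total in sorted(totals.items()):
        if total["prior"] == 0:
            continue  # relative change is undefined without a baseline
        change = (total["current"] - total["prior"]) / total["prior"]
        if abs(change) >= threshold:
            findings.append((key, round(change, 4)))
    return findings


rows = [
    {"region": "EMEA", "period": "prior", "revenue": 100.0},
    {"region": "EMEA", "period": "current", "revenue": 75.0},
    {"region": "EMEA", "period": "current", "revenue": 50.0},
    {"region": "APAC", "period": "prior", "revenue": 200.0},
    {"region": "APAC", "period": "current", "revenue": 205.0},
]
flagged = detect_variances(rows, "region")
```

In a pipeline of the kind the abstract describes, the flagged slices would presumably be passed on to the context-construction and generation stages; that hand-off is not shown here.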


Exploring the change in scientific readability following the release of ChatGPT

arXiv.org Artificial Intelligence

The rise and growing popularity of accessible large language models have raised questions about their impact on various aspects of life, including how scientists write and publish their research. The primary objective of this paper is to analyze a dataset consisting of all abstracts posted on arXiv.org between 2010 and June 7th, 2024, to assess the evolution of their readability and determine whether significant shifts occurred following the release of ChatGPT in November 2022. Four standard readability formulas are used to calculate individual readability scores for each paper, classifying their level of readability. These scores are then aggregated by year and across the eight primary categories covered by the platform. The results show a steady annual decrease in readability, suggesting that abstracts are likely becoming increasingly complex. Additionally, following the release of ChatGPT, a significant change in readability is observed for 2023 and the analyzed months of 2024. Similar trends are found across categories, with most experiencing a notable change in readability during 2023 and 2024. These findings offer insights into the broader changes in readability and point to the likely influence of AI on scientific writing.
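The abstract above does not name its four readability formulas. As an illustration of what such a formula computes, here is a minimal sketch of one widely used measure, Flesch Reading Ease, where higher scores mean easier text. The syllable counter is a rough vowel-group heuristic and an assumption of this sketch, not part of the formula itself.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (heuristic, at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


simple = flesch_reading_ease("The cat sat on the mat.")
dense = flesch_reading_ease(
    "Multidimensional summarization necessitates contextualized reasoning."
)
```

Short sentences built from one-syllable words score high, while long polysyllabic sentences score low, so `simple` comes out above `dense`. Scoring abstracts year by year with a function like this is one plausible way to obtain the kind of per-year aggregates the abstract mentions.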


How Large Language Models Are Changing MOOC Essay Answers: A Comparison of Pre- and Post-LLM Responses

arXiv.org Artificial Intelligence

The release of ChatGPT in late 2022 caused a flurry of activity and concern in the academic and educational communities. Some see the tool's ability to generate human-like text that passes at least cursory inspections for factual accuracy "often enough" as heralding a golden age of information retrieval and computer-assisted learning. Some, on the other hand, worry the tool may lead to unprecedented levels of academic dishonesty and cheating. In this work, we quantify some of the effects of the emergence of Large Language Models (LLMs) on online education by analyzing a multi-year dataset of student essay responses from a free university-level MOOC on AI ethics. Our dataset includes essays submitted both before and after ChatGPT's release. We find that the launch of ChatGPT coincided with significant changes in both the length and style of student essays, mirroring observations in other contexts such as academic publishing. We also observe -- as expected based on related public discourse -- changes in prevalence of key content words related to AI and LLMs, but not necessarily the general themes or topics discussed in the student essays as identified through (dynamic) topic modeling.


Did ChatGPT or Copilot use alter the style of internet news headlines? A time series regression analysis

arXiv.org Artificial Intelligence

The release of advanced Large Language Models (LLMs) such as ChatGPT and Copilot is changing the way text is created and may influence the content that we find on the web. This study investigated whether the release of these two popular LLMs coincided with a change in writing style in headlines and links on worldwide news websites. 175 NLP features were obtained for each text in a dataset of 451 million headlines/links. An interrupted time series analysis was applied for each of the 175 NLP features to evaluate whether there were any statistically significant sustained changes after the release dates of ChatGPT and/or Copilot. There were a total of 44 features that did not appear to have any significant sustained change after the release of ChatGPT/Copilot. A total of 91 other features did show significant change with ChatGPT and/or Copilot although significance with earlier control LLM release dates (GPT-1/2/3, Gopher) removed them from consideration. This initial analysis suggests these language models may have had a limited impact on the style of individual news headlines/links, with respect to only some NLP measures.
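The abstract above does not spell out its regression model. A common form of interrupted time series analysis is segmented regression, y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t, where post_t is 1 from the interruption t0 onward. The sketch below fits that form by ordinary least squares in plain Python. The function names are hypothetical, and it omits the significance testing and autocorrelation handling a real analysis would need.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interrupted_time_series(y, interruption):
    """OLS fit of the segmented model via the normal equations."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= interruption else 0.0
        rows.append([1.0, float(t), post, (t - interruption) * post])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {
        "baseline_level": b0,
        "baseline_slope": b1,
        "level_change": b2,   # immediate jump at the interruption
        "slope_change": b3,   # sustained change in trend afterwards
    }


# Synthetic series: a level jump of 3 and a slope increase of 0.25 at t = 10.
series = [
    2.0 + 0.5 * t + (3.0 + 0.25 * (t - 10) if t >= 10 else 0.0)
    for t in range(20)
]
fit = fit_interrupted_time_series(series, interruption=10)
```

Here `level_change` captures a one-off shift at the release date and `slope_change` a sustained change in trend. Refitting the same model at earlier control dates is one way to check whether an apparent effect is specific to the interruption being studied.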


Experience of Training a 1.7B-Parameter LLaMa Model From Scratch

arXiv.org Artificial Intelligence

Pretraining large language models is a complex endeavor influenced by multiple factors, including model architecture, data quality, training continuity, and hardware constraints. In this paper, we share insights gained from the experience of training DMaS-LLaMa-Lite, a fully open source, 1.7-billion-parameter, LLaMa-based model, on approximately 20 billion tokens of carefully curated data. We chronicle the full training trajectory, documenting how evolving validation loss levels and downstream benchmarks reflect transitions from incoherent text to fluent, contextually grounded output. Beyond pretraining, we extend our analysis to include a post-training phase focused on instruction tuning, where the model was refined to produce more contextually appropriate, user-aligned responses. We highlight practical considerations such as the importance of restoring optimizer states when resuming from checkpoints, and the impact of hardware changes on training stability and throughput. While qualitative evaluation provides an intuitive understanding of model improvements, our analysis extends to various performance benchmarks, demonstrating how high-quality data and thoughtful scaling enable competitive results with significantly fewer training tokens. By detailing these experiences and offering training logs, checkpoints, and sample outputs, we aim to guide future researchers and practitioners in refining their pretraining strategies. The training script is available on Github at https://github.com/McGill-DMaS/DMaS-LLaMa-Lite-Training-Code. The model checkpoints are available on Huggingface at https://huggingface.co/collections/McGill-DMaS/dmas-llama-lite-6761d97ba903f82341954ceb.
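To see why restoring optimizer state matters when resuming from a checkpoint, consider a toy sketch (not the authors' training code): SGD with momentum on the loss f(w) = w**2. The function name, learning rate, and momentum value are assumptions chosen for illustration.

```python
def sgd_momentum_steps(w, velocity, steps, lr=0.1, momentum=0.9):
    """Run SGD with momentum on the toy loss f(w) = w**2 (gradient 2*w)."""
    for _ in range(steps):
        grad = 2.0 * w
        velocity = momentum * velocity + grad
        w = w - lr * velocity
    return w, velocity


# Reference run: 20 uninterrupted steps.
w_full, _ = sgd_momentum_steps(5.0, 0.0, 20)

# Checkpoint after 10 steps, saving both the weight and the optimizer state.
w_ckpt, v_ckpt = sgd_momentum_steps(5.0, 0.0, 10)

# Resume with the optimizer state restored: reproduces the reference run.
w_resumed, _ = sgd_momentum_steps(w_ckpt, v_ckpt, 10)

# Resume with the optimizer state reset to zero: the trajectory diverges.
w_reset, _ = sgd_momentum_steps(w_ckpt, 0.0, 10)
```

Restoring the velocity makes the resumed run identical to the uninterrupted one, while resetting it sends the weights down a different path. In a framework such as PyTorch, the analogous step is saving `optimizer.state_dict()` with the model weights and calling `optimizer.load_state_dict(...)` on resume.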


Do LLMs exhibit human-like response biases? A case study in survey design

arXiv.org Artificial Intelligence

As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling. One widely-cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording - but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look to survey design, where human response biases caused by changes in the wordings of "prompts" have been extensively explored in social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, particularly in models that have undergone RLHF. Furthermore, even if a model shows a significant change in the same direction as humans, we find that they are sensitive to perturbations that do not elicit significant changes in humans. These results highlight the pitfalls of using LLMs as human proxies, and underscore the need for finer-grained characterizations of model behavior. Our code, dataset, and collected samples are available at https://github.com/lindiatjuatja/BiasMonkey


The Future of AI: A quick Look Ahead - iPhoneGlance

#artificialintelligence

Artificial intelligence (AI) has been a topic of discussion and research for decades. However, it is only in recent years that we have seen significant advancements in this field. AI is now being used in various industries, including healthcare, finance, and transportation, to name a few. With these advancements and applications, it is clear that AI is here to stay and will shape the future in various ways. So, what does the future of AI look like?


AI Predictions: Who Thinks What, and Why? - by Zoltan Tapi

#artificialintelligence

As we continue to make strides in the field of Artificial Intelligence (AI), one concept that has been gaining momentum is Artificial General Intelligence (AGI). Unlike traditional AI systems, AGI aims to replicate the human-like ability to learn, reason and adapt in any given situation. In other words, AGI seeks to create a machine that can perform any intellectual task that a human can do. This level of sophistication is still far from being achieved, but experts predict that once we create AGI, it will be a major turning point in human history, with implications far beyond what we can currently imagine. In this article, we'll dive into what experts are saying about AGI and what it could mean for the future of humanity.


Leading lawmakers pitch extending scope of AI rulebook to the metaverse

#artificialintelligence

The European Parliament's co-rapporteurs Dragoş Tudorache and Brando Benifei circulated two new batches of compromise amendments, seen by EURACTIV, on Wednesday (28 September), ahead of the technical discussion with the other political groups on Friday. These latest batches introduce significant changes to the regulation's scope, subject matter and obligations for high-risk AI systems concerning risk management, data governance and technical documentation. A new article has been added to extend the regulation's scope to AI system operators in specific metaverse environments that meet several cumulative conditions. These criteria are that the metaverse requires an authenticated avatar, is built for interaction on a large scale, allows social interactions similar to the real world, engages in real-world financial transactions and entails health or fundamental rights risks. The scope has been expanded from AI providers to any economic operators placing an AI system on the market or putting it into service.


Opinion: Artificial intelligence can no longer be ignored - we need policies to deal with it

#artificialintelligence

WE ARE AT the start of some of the most significant changes in human history that will result from the increased use of artificial intelligence (AI) in virtually every area of life. The speed of some of these changes will be exciting or frightening, depending on your perspective. In medicine, AI is being used already to diagnose and remotely treat patients, as well as to identify potential new drugs and make prescriptions. Job applicants are shortlisted using AI and credit ratings are determined by the same technology. Data is increasingly being used to determine large swathes of public policy, from policing to transport planning. In science fiction, the cyborg is part human, part machine, with the technology carrying out many of the human's traditional functions at speed.